Segmenting Consumers of Bath Soap

library(tidyverse)   # attaches dplyr and ggplot2, so they need no separate calls
library(factoextra)  # fviz_nbclust(), fviz_cluster()
library(GGally)      # ggparcoord()
setwd("~/Desktop")
BathSoap <- read.csv("BathSoap.csv")
BathSoap <- data.frame(lapply(BathSoap, function(x) as.numeric(sub("%", "", x))))
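# Columns 20:46 hold percentage shares; convert them to volumes (share / 100 * Total.Volume).
# across() replaces the deprecated mutate_at() + funs() idiom.
BathSoap <- BathSoap %>% mutate(across(20:46, ~ .x / 100 * Total.Volume))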
head(BathSoap)
##   Member.id SEC FEH MT SEX AGE EDU HS CHILD CS Affluence.Index
## 1   1010010   4   3 10   1   4   4  2     4  1               2
## 2   1010020   3   2 10   2   2   4  4     2  1              19
## 3   1014020   2   3 10   2   4   5  6     4  1              23
## 4   1014030   4   0  0   0   4   0  0     5  0               0
## 5   1014190   4   1 10   2   3   4  4     3  1              10
## 6   1017020   4   3 10   2   3   4  5     2  1              13
##   No..of.Brands Brand.Runs Total.Volume No..of..Trans  Value
## 1             3         17         8025            24  818.0
## 2             5         25        13975            40 1681.5
## 3             5         37        23100            63 1950.0
## 4             2          4         1500             4  114.0
## 5             3          6         8300            13  591.0
## 6             3         26        18175            41 1705.5
##   Trans...Brand.Runs Vol.Tran Avg..Price Pur.Vol.No.Promo....
## 1               1.41   334.38      10.19              8025.00
## 2               1.60   349.38      12.03             12437.75
## 3               1.70   366.67       8.44             21714.00
## 4               1.00   375.00       7.60              1500.00
## 5               2.17   638.46       7.12              5063.00
## 6               1.58   443.29       9.38             18175.00
##   Pur.Vol.Promo.6.. Pur.Vol.Other.Promo.. Br..Cd..57..144 Br..Cd..55
## 1               0.0                   0.0          3049.5    1043.25
## 2            1397.5                 279.5           279.5    1118.00
## 3             462.0                 924.0           693.0   12705.00
## 4               0.0                   0.0           600.0     900.00
## 5            1162.0                1992.0           415.0    1162.00
## 6               0.0                   0.0          1454.0    1272.25
##   Br..Cd..272 Br..Cd..286 Br..Cd..24 Br..Cd..481 Br..Cd..352 Br..Cd..5
## 1           0           0          0         0.0           0       0.0
## 2           0           0          0       838.5           0    1956.5
## 3           0         693          0         0.0           0     462.0
## 4           0           0          0         0.0           0       0.0
## 5           0           0          0         0.0           0       0.0
## 6           0           0          0         0.0           0       0.0
##   Others.999 Pr.Cat.1 Pr.Cat.2 Pr.Cat.3 Pr.Cat.4 PropCat.5 PropCat.6
## 1   3948.300  1845.75  4494.00  1043.25   561.75   4012.50      0.00
## 2   9768.525  4052.75  7686.25  1257.75   838.50   6428.50   4891.25
## 3   8754.900  2772.00  7392.00 12936.00     0.00   5544.00   2772.00
## 4      0.000     0.00   600.00   900.00     0.00    600.00      0.00
## 5   6698.100     0.00   415.00  1162.00  6723.00   6723.00      0.00
## 6  15575.975  3998.50  8178.75  1272.25  4907.25   8905.75   1817.50
##   PropCat.7 PropCat.8 PropCat.9 PropCat.10 PropCat.11 PropCat.12
## 1      0.00      0.00      0.00          0        0.0     240.75
## 2    419.25    279.50    139.75          0      838.5       0.00
## 3    693.00    231.00    231.00          0        0.0     462.00
## 4      0.00      0.00      0.00          0        0.0       0.00
## 5      0.00    415.00      0.00          0        0.0       0.00
## 6      0.00    181.75   1272.25          0        0.0       0.00
##   PropCat.13 PropCat.14 PropCat.15
## 1          0    1043.25    2728.50
## 2          0    1118.00       0.00
## 3          0   12936.00       0.00
## 4          0     900.00       0.00
## 5          0    1162.00       0.00
## 6          0    1272.25    4907.25
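
The preparation step above strips the percent signs and turns each percentage share into a volume by multiplying it by the household's total volume. As a minimal sketch of that arithmetic, the toy data frame below uses hypothetical column names (`Br.A`, `Br.B`) and made-up share values; it is not the original file.

```r
# Toy data in the same shape as the raw file: shares stored as "38%" strings.
# Column names and values here are illustrative assumptions.
toy <- data.frame(
  Total.Volume = c(8025, 1500),
  Br.A = c("38%", "40%"),
  Br.B = c("13%", "60%"),
  stringsAsFactors = FALSE
)

# Strip the percent sign and coerce every column to numeric.
toy <- data.frame(lapply(toy, function(x) as.numeric(sub("%", "", x))))

# Convert percentage shares into volumes: share / 100 * total volume.
share_cols <- c("Br.A", "Br.B")
toy[share_cols] <- toy[share_cols] / 100 * toy$Total.Volume
toy
```

After this step each share column is measured in the same units as `Total.Volume`, which is why the data are standardized before clustering.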
BS <- scale(BathSoap[,-c(1:11)])
BS <- cbind(BathSoap[,1:11],BS)
head(BS)
##   Member.id SEC FEH MT SEX AGE EDU HS CHILD CS Affluence.Index
## 1   1010010   4   3 10   1   4   4  2     4  1               2
## 2   1010020   3   2 10   2   2   4  4     2  1              19
## 3   1014020   2   3 10   2   4   5  6     4  1              23
## 4   1014030   4   0  0   0   4   0  0     5  0               0
## 5   1014190   4   1 10   2   3   4  4     3  1              10
## 6   1017020   4   3 10   2   3   4  5     2  1              13
##   No..of.Brands Brand.Runs Total.Volume No..of..Trans      Value
## 1    -0.4030277  0.1200727   -0.5005898    -0.4104681 -0.5881031
## 2     0.8630280  0.8895639    0.2651391     0.5076339  0.3896410
## 3     0.8630280  2.0438006    1.4394712     1.8274054  0.6936645
## 4    -1.0360556 -1.1303505   -1.3403176    -1.5580955 -1.3852447
## 5    -0.4030277 -0.9379777   -0.4651989    -1.0416632 -0.8451360
## 6    -0.4030277  0.9857502    0.8056536     0.5650152  0.4168163
##   Trans...Brand.Runs   Vol.Tran  Avg..Price Pur.Vol.No.Promo....
## 1         -0.4636969 -0.3242918 -0.43944366           -0.3943558
## 2         -0.3907514 -0.2639930  0.05217678            0.1983882
## 3         -0.3523590 -0.1944886 -0.90701745            1.4444233
## 4         -0.6211057 -0.1610026 -1.13145287           -1.2708284
## 5         -0.1719147  0.8980852 -1.25970168           -0.7922274
## 6         -0.3984298  0.1135176 -0.65586353            0.9690461
##   Pur.Vol.Promo.6.. Pur.Vol.Other.Promo.. Br..Cd..57..144 Br..Cd..55
## 1        -0.5574329            -0.5116939       0.2226321 -0.1754393
## 2         0.7686897            -0.1252184      -0.4805832 -0.1578175
## 3        -0.1190296             0.7659568      -0.3756086  2.5737501
## 4        -0.5574329            -0.5116939      -0.3992184 -0.2092097
## 5         0.5452179             2.2427218      -0.4461840 -0.1474447
## 6        -0.5574329            -0.5116939      -0.1824148 -0.1214539
##   Br..Cd..272 Br..Cd..286 Br..Cd..24 Br..Cd..481 Br..Cd..352  Br..Cd..5
## 1  -0.3448178  -0.2253619 -0.2371511  -0.2538919  -0.2549821 -0.2929379
## 2  -0.3448178  -0.2253619 -0.2371511   0.4047639  -0.2549821  2.6176765
## 3  -0.3448178   0.1428499 -0.2371511  -0.2538919  -0.2549821  0.3943628
## 4  -0.3448178  -0.2253619 -0.2371511  -0.2538919  -0.2549821 -0.2929379
## 5  -0.3448178  -0.2253619 -0.2371511  -0.2538919  -0.2549821 -0.2929379
## 6  -0.3448178  -0.2253619 -0.2371511  -0.2538919  -0.2549821 -0.2929379
##   Others.999      Pr.Cat.1   Pr.Cat.2   Pr.Cat.3   Pr.Cat.4    PropCat.5
## 1 -0.3870628 -0.2794416143 -0.2520866 -0.1988988 -0.1925159 -0.246319991
## 2  0.6565013  0.3854838369  0.2625352 -0.1496659 -0.1056810  0.141979507
## 3  0.4747587 -0.0003808082  0.2150992  2.5307700 -0.3687742 -0.000177325
## 4 -1.0949914 -0.8355295851 -0.8798374 -0.2317780 -0.3687742 -0.794776958
## 5  0.1059753 -0.8355295851 -0.9096612 -0.1716428  1.7406778  0.189311544
## 6  1.6977747  0.3691393848  0.3419310 -0.1463378  1.1709563  0.540123103
##    PropCat.6  PropCat.7  PropCat.8  PropCat.9 PropCat.10 PropCat.11
## 1 -0.4980207 -0.4173594 -0.5069920 -0.4317488 -0.2850019 -0.2651424
## 2  1.6247065 -0.2567103 -0.3267324 -0.2787491 -0.2850019  0.3729320
## 3  0.7049846 -0.1518142 -0.3580118 -0.1788477 -0.2850019 -0.2651424
## 4 -0.4980207 -0.4173594 -0.5069920 -0.4317488 -0.2850019 -0.2651424
## 5 -0.4980207 -0.4173594 -0.2393435 -0.4317488 -0.2850019 -0.2651424
## 6  0.2907463 -0.4173594 -0.3897749  0.9611232 -0.2850019 -0.2651424
##   PropCat.12 PropCat.13 PropCat.14 PropCat.15
## 1  0.8905060 -0.2536688 -0.1912455  2.1061267
## 2 -0.2907978 -0.2536688 -0.1739337 -0.2505867
## 3  1.9761280 -0.2536688  2.5630697 -0.2505867
## 4 -0.2907978 -0.2536688 -0.2244217 -0.2505867
## 5 -0.2907978 -0.2536688 -0.1637435 -0.2505867
## 6 -0.2907978 -0.2536688 -0.1382100  3.9879993

Identify clusters of households based on Purchase behavior

fviz_nbclust(BS[,c(12:31)], kmeans, method = "wss")

fviz_nbclust(BS[,c(12:31)], kmeans, method = "silhouette")
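
The silhouette criterion plotted by `fviz_nbclust()` can also be inspected numerically. The sketch below is one possible way to do that with `silhouette()` from the `cluster` package; the helper name `avg_silhouette` and the synthetic data are assumptions made for illustration, not part of the original analysis.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper: average silhouette width of a k-means fit for each k.
avg_silhouette <- function(x, ks = 2:6, nstart = 25, seed = 120) {
  d <- dist(x)  # Euclidean distances between observations
  sapply(ks, function(k) {
    set.seed(seed)
    fit <- kmeans(x, centers = k, nstart = nstart)
    mean(silhouette(fit$cluster, d)[, "sil_width"])
  })
}

# Synthetic stand-in for the scaled data: two well-separated groups.
set.seed(1)
demo <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
              matrix(rnorm(100, mean = 6), ncol = 2))
sil <- avg_silhouette(demo)
names(sil) <- 2:6
sil
```

On the real data the call would presumably be `avg_silhouette(BS[, 12:31])`; the k with the largest average width is the one the silhouette plot highlights.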

set.seed(120)
pk2 <- kmeans(BS[,c(12:31)], centers = 2, nstart = 25)
pk2
## K-means clustering with 2 clusters of sizes 148, 452
## 
## Cluster means:
##   No..of.Brands Brand.Runs Total.Volume No..of..Trans     Value
## 1     0.3839799  0.5256153    1.3353247     0.7871742  1.176034
## 2    -0.1257279 -0.1721041   -0.4372302    -0.2577473 -0.385073
##   Trans...Brand.Runs   Vol.Tran  Avg..Price Pur.Vol.No.Promo....
## 1         0.18290360  0.7673642 -0.27557620            1.2822270
## 2        -0.05988879 -0.2512609  0.09023291           -0.4198442
##   Pur.Vol.Promo.6.. Pur.Vol.Other.Promo.. Br..Cd..57..144 Br..Cd..55
## 1         0.4821548             0.4661500       0.3593261  0.4911850
## 2        -0.1578737            -0.1526332      -0.1176554 -0.1608305
##   Br..Cd..272 Br..Cd..286  Br..Cd..24 Br..Cd..481 Br..Cd..352   Br..Cd..5
## 1  0.16526060  0.20168930  0.13563777  0.30345639  0.08320394  0.24331419
## 2 -0.05411188 -0.06603986 -0.04441237 -0.09936183 -0.02724377 -0.07966925
##   Others.999
## 1  0.9988818
## 2 -0.3270675
## 
## Clustering vector:
##   [1] 2 1 1 2 2 1 2 2 1 2 1 2 1 2 2 2 1 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 1 1
##  [36] 2 2 2 2 2 2 2 1 2 1 2 2 2 2 1 1 2 2 1 2 1 1 1 1 2 1 1 1 2 2 2 1 1 2 2
##  [71] 2 2 2 2 2 2 2 2 1 2 2 1 2 2 2 1 2 1 2 2 2 1 2 2 2 2 1 2 2 2 2 2 1 1 2
## [106] 1 1 2 2 2 2 2 2 1 1 2 1 2 1 2 1 1 2 1 1 1 1 2 2 1 1 1 2 2 2 2 2 2 2 1
## [141] 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 2 1 2 1 2 2 2 2 2 1
## [176] 2 1 2 1 2 2 1 2 2 1 2 1 1 1 2 2 1 2 2 1 2 2 2 1 2 2 1 1 2 2 2 2 1 2 1
## [211] 2 2 2 1 1 2 2 2 2 2 2 1 1 2 2 2 2 1 2 2 2 1 2 1 2 1 1 2 1 2 2 2 2 2 2
## [246] 2 2 2 2 2 2 2 2 1 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 2 1 1 2 2 2 2 1 2 2 2
## [281] 1 1 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 1
## [316] 1 2 2 1 2 1 2 1 1 2 2 1 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 1 2 2 2
## [351] 2 2 2 2 2 2 1 2 2 1 2 2 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 1
## [386] 2 2 2 2 2 1 2 2 2 1 2 2 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
## [421] 1 2 2 2 2 1 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [456] 2 2 1 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 1 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2
## [491] 2 2 2 2 1 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2
## [526] 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 1 2 2 2 1 2 2 2 2 2 2
## [561] 2 2 2 2 1 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2
## [596] 2 2 1 2 2
## 
## Within cluster sum of squares by cluster:
## [1] 5379.942 4907.679
##  (between_SS / total_SS =  14.1 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
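# Pass only the columns that were clustered, so the plot is not distorted by Member.id and the demographic columns.
fviz_cluster(pk2, data = BS[, 12:31])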

Identify clusters of households based on Basis of purchase

fviz_nbclust(BS[,c(32:46)], kmeans, method = "wss")

fviz_nbclust(BS[,c(32:46)], kmeans, method = "silhouette")

set.seed(120)
bk4 <- kmeans(BS[,c(32:46)], centers = 4, nstart = 25)
bk4
## K-means clustering with 4 clusters of sizes 99, 48, 56, 397
## 
## Cluster means:
##     Pr.Cat.1   Pr.Cat.2   Pr.Cat.3    Pr.Cat.4  PropCat.5   PropCat.6
## 1  0.1661731  1.1324062 -0.1806369  0.77916278  1.4390664 -0.01285879
## 2  1.8559024  0.1231322 -0.3602898 -0.24398568 -0.2408319  0.97459347
## 3 -0.5683449 -0.5400881  2.6903448 -0.08367702 -0.5188230  0.07317085
## 4 -0.1856603 -0.2210923 -0.2908875 -0.15299720 -0.2565581 -0.12494971
##     PropCat.7   PropCat.8   PropCat.9  PropCat.10 PropCat.11 PropCat.12
## 1 -0.09915360  0.07733577  0.93989828 -0.09869787  0.3519956 -0.1030397
## 2  1.15236127  0.02499432  0.05503680  1.09157261 -0.0161106  1.8006345
## 3 -0.34781908 -0.40801906 -0.05306282 -0.25297241 -0.1928943 -0.1256151
## 4 -0.06553971  0.03524710 -0.23355209 -0.07168247 -0.0586201 -0.1742949
##   PropCat.13 PropCat.14  PropCat.15
## 1 -0.2171882 -0.1806510  0.19702711
## 2  1.5052363 -0.3567717 -0.02484209
## 3 -0.1748551  2.6852954 -0.15972750
## 4 -0.1031683 -0.2905971 -0.02359830
## 
## Clustering vector:
##   [1] 4 4 3 4 4 1 4 3 1 4 1 4 1 4 4 4 1 4 1 4 4 3 3 4 4 2 4 1 4 4 3 4 4 3 3
##  [36] 4 4 4 3 4 4 3 3 4 1 4 4 4 4 2 3 1 4 1 3 1 3 1 3 4 4 3 3 4 4 3 3 1 4 4
##  [71] 4 4 3 4 4 4 4 4 2 4 4 3 3 4 4 1 4 1 4 3 2 3 3 4 2 4 3 4 4 4 4 4 1 4 1
## [106] 1 2 4 4 4 3 1 4 1 1 4 4 3 1 4 1 2 1 2 1 3 1 4 4 2 1 2 4 4 4 2 4 4 4 1
## [141] 1 3 4 3 2 3 4 4 4 4 4 4 4 3 4 4 4 4 4 3 3 3 1 2 1 4 1 4 1 4 1 4 4 3 1
## [176] 4 1 3 3 1 4 1 2 4 1 1 1 1 1 4 1 2 4 4 4 4 4 4 2 4 1 1 3 4 3 4 4 1 4 1
## [211] 4 4 4 1 1 4 4 4 3 2 4 1 3 4 4 4 4 4 4 4 3 1 3 2 4 3 3 3 3 4 4 4 4 4 3
## [246] 4 4 4 4 4 4 4 4 1 1 4 1 3 1 4 4 4 4 1 1 4 4 4 4 4 1 1 4 4 4 4 1 4 4 4
## [281] 1 1 4 1 1 4 4 4 4 2 4 4 4 4 4 1 3 1 1 2 4 4 4 2 4 4 4 4 4 2 4 1 1 2 1
## [316] 1 4 4 4 4 3 4 1 3 4 4 2 4 4 4 1 4 4 4 4 4 1 4 4 4 4 4 1 4 4 4 4 4 4 4
## [351] 4 4 2 4 4 4 2 4 2 4 4 4 1 2 2 4 4 4 4 4 4 4 4 4 3 4 4 4 1 4 4 4 1 4 4
## [386] 4 4 1 4 4 2 4 4 4 4 4 4 4 4 4 4 2 1 2 4 4 4 4 2 4 4 4 2 4 4 4 4 4 4 4
## [421] 1 4 4 4 4 4 4 4 4 4 1 1 1 4 2 4 4 4 4 4 4 4 4 2 4 4 4 4 4 4 4 4 4 2 4
## [456] 4 1 2 1 4 4 4 4 2 1 3 4 4 4 4 4 4 4 1 4 1 1 4 4 2 4 4 4 2 4 4 4 4 4 4
## [491] 4 4 4 4 2 2 4 4 1 4 3 4 4 4 4 4 4 4 4 4 4 4 4 2 4 4 4 4 4 4 4 4 4 1 4
## [526] 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 4 1 4 4 4 4 4 4 4 4 4 4 1 4 4 2
## [561] 4 1 4 4 4 4 4 4 4 4 4 4 4 4 2 4 4 4 4 4 4 4 4 4 4 4 2 4 4 1 4 4 4 4 4
## [596] 4 4 1 4 4
## 
## Within cluster sum of squares by cluster:
## [1] 2252.4953 1891.6807  410.5978 2231.0908
##  (between_SS / total_SS =  24.5 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
# Pass only the columns that were clustered.
fviz_cluster(bk4, data = BS[, 32:46])

Purchase behavior and Basis of purchase, k = 3

fviz_nbclust(BS[,c(12:46)], kmeans, method = "wss")

fviz_nbclust(BS[,c(12:46)], kmeans, method = "silhouette")

set.seed(120)
k3 <- kmeans(BS[,c(12:46)], centers = 3, nstart = 25)
k3
## K-means clustering with 3 clusters of sizes 424, 54, 122
## 
## Cluster means:
##   No..of.Brands Brand.Runs Total.Volume No..of..Trans      Value
## 1   -0.09248577 -0.1321898   -0.4656020   -0.25848864 -0.3708717
## 2   -0.35613679 -0.5924191    0.7172361   -0.01198633 -0.1557435
## 3    0.47906028  0.7216319    1.3006927    0.90365940  1.3578667
##   Trans...Brand.Runs   Vol.Tran  Avg..Price Pur.Vol.No.Promo....
## 1        -0.15865849 -0.2872994  0.18397336           -0.4467837
## 2         1.35361406  0.7513708 -1.35875806            0.6967571
## 3        -0.04773737  0.6659092 -0.03796531            1.2443557
##   Pur.Vol.Promo.6.. Pur.Vol.Other.Promo.. Br..Cd..57..144 Br..Cd..55
## 1       -0.14768724            -0.1952011     -0.09251371 -0.2952935
## 2       -0.05519063             0.6157772     -0.44555219  2.7091279
## 3        0.53770232             0.4058468      0.51873469 -0.1728562
##   Br..Cd..272 Br..Cd..286  Br..Cd..24 Br..Cd..481 Br..Cd..352   Br..Cd..5
## 1 -0.03609944 -0.05447222 -0.03827278  -0.0913534 -0.01319092 -0.05926465
## 2 -0.27633726 -0.17052912 -0.13783678  -0.1765512 -0.21847185 -0.17453774
## 3  0.24777356  0.26479340  0.19402330   0.3956361  0.14254450  0.28322337
##   Others.999   Pr.Cat.1   Pr.Cat.2   Pr.Cat.3   Pr.Cat.4  PropCat.5
## 1 -0.2982449 -0.1419520 -0.2328618 -0.2977887 -0.1273257 -0.2135347
## 2 -0.4971270 -0.5788665 -0.6174472  2.7013045 -0.1323849 -0.5807933
## 3  1.2565630  0.7495611  1.0825866 -0.1607215  0.5011054  0.9991930
##    PropCat.6   PropCat.7   PropCat.8   PropCat.9  PropCat.10  PropCat.11
## 1 -0.1500091 -0.06716428 -0.05191804 -0.15715382 -0.01087804 -0.09104144
## 2 -0.0746225 -0.35998537 -0.41496499 -0.05220554 -0.25178613 -0.19021844
## 3  0.5543727  0.39276120  0.36410949  0.56928130  0.14925196  0.40060137
##   PropCat.12  PropCat.13 PropCat.14  PropCat.15
## 1 -0.1015529 -0.03311153 -0.2972731 -0.07129654
## 2 -0.1194972 -0.17193607  2.7170638 -0.15636234
## 3  0.4058302  0.19117899 -0.1694891  0.31699425
## 
## Clustering vector:
##   [1] 1 1 2 1 1 3 1 2 3 1 3 1 3 1 1 1 3 1 1 1 2 2 2 1 1 3 1 1 1 1 2 1 1 2 2
##  [36] 1 1 1 2 1 1 2 2 1 3 1 1 1 1 3 2 1 1 3 2 3 2 3 2 1 3 2 2 1 1 2 2 3 1 1
##  [71] 1 1 2 1 1 1 1 1 3 1 1 2 2 1 1 3 1 3 1 2 1 2 2 1 1 1 2 1 1 1 1 1 3 3 1
## [106] 3 3 1 1 1 2 1 1 3 3 1 3 2 3 1 3 3 1 3 3 2 3 1 1 3 3 3 1 1 2 1 1 1 1 3
## [141] 1 2 1 1 3 2 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 2 3 3 3 1 3 1 3 1 1 1 1 2 3
## [176] 1 3 2 2 1 1 3 1 1 3 1 3 3 3 1 1 3 1 1 1 1 1 1 3 1 1 3 3 1 2 1 1 3 1 3
## [211] 1 1 1 3 3 1 1 1 2 1 1 3 2 1 1 1 1 3 1 1 2 3 2 3 1 2 2 2 2 1 1 1 1 1 2
## [246] 1 1 1 1 1 1 1 1 3 1 1 1 2 3 1 1 1 1 3 3 1 1 1 1 1 3 3 1 1 1 1 3 1 1 1
## [281] 3 3 1 3 3 1 1 1 1 1 1 1 1 1 1 1 2 1 3 1 1 1 1 3 1 1 1 1 1 1 1 3 3 1 3
## [316] 3 1 1 3 1 2 1 3 3 1 1 3 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 3 1 1 1
## [351] 1 1 1 1 1 1 3 1 1 3 1 1 3 3 3 1 1 1 1 1 1 1 1 1 2 1 1 1 3 1 1 1 3 1 3
## [386] 1 1 1 1 1 3 1 1 1 3 1 1 1 1 1 1 3 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1
## [421] 3 1 1 1 1 3 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [456] 1 1 3 3 1 1 1 1 1 3 2 1 1 1 1 1 1 1 1 3 3 1 1 1 3 1 1 1 1 1 1 1 1 1 1
## [491] 1 1 1 1 3 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 3 1
## [526] 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 3 3 3 1 1 1 3 1 1 1 3 1 1 1 1 1 1
## [561] 1 1 1 1 3 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1
## [596] 1 1 3 1 1
## 
## Within cluster sum of squares by cluster:
## [1] 7224.622 1215.646 8528.354
##  (between_SS / total_SS =  19.1 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
# Pass only the columns that were clustered.
fviz_cluster(k3, data = BS[, 12:46])

k3center <- as.data.frame(k3$centers)
colnames(k3center) <- c("No.Br","Br.Runs","Total.Vol","No.Trans","Value","Trans.Br.Runs","Vol.Tran","Avg.Price","PVNP","PVP6","PVOP","Br.57.144","Br.55","Br.272","Br.286","Br.24","Br.481","Br.352","Br.5","Br.999","Pt1","Pt2","Pt3","Pt4","Pt5","Pt6","Pt7","Pt8","Pt9","Pt10","Pt11","Pt12","Pt13","Pt14","Pt15")
cluster <- matrix(c("1","2","3"),nrow = 3)
k3center <- cbind(cluster,k3center)

Purchase summary over the period, k = 3

ggparcoord(k3center, columns = 2:9, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Purchase within promotion, k = 3

ggparcoord(k3center, columns = 10:12, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Brandwise purchase, k = 3

ggparcoord(k3center, columns = 13:21, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Price categorywise purchase, k = 3

ggparcoord(k3center, columns = 22:25, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Selling propositionwise purchase, k = 3

ggparcoord(k3center, columns = 26:36, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Purchase behavior and Basis of purchase, k = 2

set.seed(120)
k2 <- kmeans(BS[,c(12:46)], centers = 2, nstart = 25)
k2
## K-means clustering with 2 clusters of sizes 466, 134
## 
## Cluster means:
##   No..of.Brands Brand.Runs Total.Volume No..of..Trans      Value
## 1    -0.1449263 -0.2041952   -0.3847868    -0.2643057 -0.3750085
## 2     0.5039973  0.7101116    1.3381391     0.9191525  1.3041341
##   Trans...Brand.Runs   Vol.Tran  Avg..Price Pur.Vol.No.Promo....
## 1       -0.001571150 -0.1865315  0.03398412           -0.3661046
## 2        0.005463849  0.6486840 -0.11818358            1.2731696
##   Pur.Vol.Promo.6.. Pur.Vol.Other.Promo.. Br..Cd..57..144  Br..Cd..55
## 1        -0.1679486            -0.1283127      -0.1297062 -0.03331724
## 2         0.5840601             0.4462216       0.4510677  0.11586443
##   Br..Cd..272 Br..Cd..286  Br..Cd..24 Br..Cd..481 Br..Cd..352   Br..Cd..5
## 1 -0.06196977 -0.06752008 -0.05093196  -0.1011599  -0.0336199 -0.07533559
## 2  0.21550681  0.23480864  0.17712160   0.3517949   0.1169170  0.26198795
##   Others.999   Pr.Cat.1   Pr.Cat.2    Pr.Cat.3   Pr.Cat.4  PropCat.5
## 1 -0.3379028 -0.1938666 -0.2885875 -0.03667022 -0.1239964 -0.2579915
## 2  1.1750949  0.6741928  1.0035952  0.12752481  0.4312115  0.8971943
##    PropCat.6  PropCat.7   PropCat.8  PropCat.9  PropCat.10 PropCat.11
## 1 -0.1622671 -0.0955427 -0.09287433 -0.1610016 -0.03283336 -0.1020414
## 2  0.5643019  0.3322604  0.32298088  0.5599011  0.11418168  0.3548603
##   PropCat.12  PropCat.13  PropCat.14  PropCat.15
## 1 -0.1073432 -0.04865676 -0.03397851 -0.08876794
## 2  0.3732981  0.16920932  0.11816407  0.30870044
## 
## Clustering vector:
##   [1] 1 2 2 1 1 2 1 1 2 1 2 1 2 1 1 1 2 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1
##  [36] 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 2 1 2 2 2 1 1 2 1 1 1 1 1 2 2 1 1
##  [71] 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 2 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1
## [106] 2 2 1 1 1 1 1 1 2 2 1 2 1 2 1 2 2 1 2 2 2 2 1 1 2 2 2 1 1 1 1 1 1 1 2
## [141] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 2 2 1 2 1 2 1 1 1 1 1 2
## [176] 1 2 1 2 1 1 2 1 1 2 1 2 2 2 1 1 2 1 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 1 2
## [211] 1 1 2 2 2 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1
## [246] 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 2 2 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1
## [281] 2 2 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 2 2 1 2
## [316] 2 1 1 2 1 2 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1
## [351] 1 1 1 1 1 1 2 1 1 2 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 2
## [386] 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1
## [421] 2 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [456] 1 1 2 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1
## [491] 1 1 1 1 2 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1
## [526] 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 2 1 1 1 2 1 1 1 1 1 1
## [561] 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1
## [596] 1 1 2 1 1
## 
## Within cluster sum of squares by cluster:
## [1] 9329.794 9330.351
##  (between_SS / total_SS =  11.0 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
# Pass only the columns that were clustered.
fviz_cluster(k2, data = BS[, 12:46])

k2center <- as.data.frame(k2$centers)
colnames(k2center) <- c("No.Br","Br.Runs","Total.Vol","No.Trans","Value","Trans.Br.Runs","Vol.Tran","Avg.Price","PVNP","PVP6","PVOP","Br.57.144","Br.55","Br.272","Br.286","Br.24","Br.481","Br.352","Br.5","Br.999","Pt1","Pt2","Pt3","Pt4","Pt5","Pt6","Pt7","Pt8","Pt9","Pt10","Pt11","Pt12","Pt13","Pt14","Pt15")
cluster <- matrix(c("1","2"),nrow = 2)
k2center <- cbind(cluster,k2center)

Purchase summary over the period, k = 2

ggparcoord(k2center, columns = 2:9, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Purchase within promotion, k = 2

ggparcoord(k2center, columns = 10:12, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Brandwise purchase, k = 2

ggparcoord(k2center, columns = 13:21, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Price categorywise purchase, k = 2

ggparcoord(k2center, columns = 22:25, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Selling propositionwise purchase, k = 2

ggparcoord(k2center, columns = 26:36, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Purchase behavior and Basis of purchase, k = 4

set.seed(120)
k4 <- kmeans(BS[,c(12:46)], centers = 4, nstart = 25)
k4
## K-means clustering with 4 clusters of sizes 52, 159, 351, 38
## 
## Cluster means:
##   No..of.Brands Brand.Runs Total.Volume No..of..Trans       Value
## 1    -0.3178125 -0.5550818    0.8179290    0.06954917 -0.08493788
## 2     0.7435888  0.9754662    0.3226669    0.96632396  0.65925598
## 3    -0.2713724 -0.3531863   -0.5144466   -0.46539726 -0.50279058
## 4    -0.1698070 -0.0596440    2.2824841    0.16032557  2.00196217
##   Trans...Brand.Runs   Vol.Tran   Avg..Price Pur.Vol.No.Promo....
## 1         1.35462582  0.7507119 -1.338110210            0.8055906
## 2        -0.20919640 -0.3438857  0.514423676            0.2621991
## 3        -0.11532150 -0.1972378  0.008589846           -0.4838254
## 4         0.08682978  2.2334547 -0.400701833            2.2695356
##   Pur.Vol.Promo.6.. Pur.Vol.Other.Promo.. Br..Cd..57..144  Br..Cd..55
## 1       -0.03587362             0.5481555     -0.42588824  2.78575343
## 2        0.38892339             0.2020246     -0.11126248 -0.30328607
## 3       -0.21838677            -0.2361182     -0.06576176 -0.27053370
## 4        0.43895702             0.5855657      1.65577105 -0.04419375
##   Br..Cd..272 Br..Cd..286 Br..Cd..24 Br..Cd..481 Br..Cd..352  Br..Cd..5
## 1 -0.27370339 -0.16842017 -0.1381482 -0.17357659 -0.21706761 -0.1699839
## 2  0.22597542  0.01829106  0.4568449  0.03579013  0.01987912  0.4253193
## 3 -0.08119801 -0.04780401 -0.1648062 -0.08663000  0.02897091 -0.1531229
## 4  0.17902592  0.59549419 -0.2002015  0.88796007 -0.05373823 -0.1326441
##   Others.999   Pr.Cat.1   Pr.Cat.2    Pr.Cat.3    Pr.Cat.4   PropCat.5
## 1 -0.4318378 -0.5442678 -0.5914525  2.77808631 -0.07670589 -0.53360466
## 2  0.5919819  0.9322732  0.1553378 -0.30349931 -0.07144406 -0.04596878
## 3 -0.3849787 -0.3535908 -0.2198827 -0.27093927 -0.08855764 -0.18461162
## 4  1.6699460  0.1100221  2.1904122 -0.02906355  1.22189589  2.62776730
##     PropCat.6   PropCat.7  PropCat.8   PropCat.9  PropCat.10  PropCat.11
## 1 -0.05833796 -0.35777868 -0.4114255  0.05546797 -0.25050860 -0.18733674
## 2  0.46501829  0.37789106  0.4044906  0.35105963  0.34084122  0.04807838
## 3 -0.24130847 -0.09717295 -0.1398660 -0.18987208 -0.11606667 -0.08507712
## 4  0.36302474 -0.19401267  0.1624500  0.20900748 -0.01126065  0.84102937
##   PropCat.12 PropCat.13  PropCat.14 PropCat.15
## 1 -0.1129087 -0.1721924  2.79384348 -0.1527383
## 2  0.3570557  0.4759960 -0.30278794  0.3052338
## 3 -0.1672040 -0.1707809 -0.27083693 -0.1020832
## 4  0.2049469 -0.1785600 -0.05454774 -0.1252252
## 
## Clustering vector:
##   [1] 3 2 1 3 3 2 3 1 4 3 2 2 4 3 3 3 2 3 3 3 3 1 1 3 3 2 3 3 3 3 1 3 3 1 1
##  [36] 3 3 3 1 3 3 1 1 3 4 3 3 3 3 2 1 3 3 4 1 4 1 2 1 3 2 1 1 3 2 1 1 2 3 3
##  [71] 3 3 1 3 3 3 3 3 2 3 3 1 1 3 3 3 3 4 3 1 2 1 1 2 2 2 1 2 3 3 3 3 1 2 3
## [106] 2 2 3 2 3 1 3 3 2 4 2 2 3 2 3 2 2 3 2 4 1 4 3 3 2 2 2 3 3 3 2 3 3 3 4
## [141] 3 1 3 3 2 1 3 3 3 3 3 3 3 1 2 2 2 3 3 1 1 1 4 2 4 2 4 3 2 3 3 3 3 1 4
## [176] 3 4 1 1 3 3 4 2 3 2 3 4 2 4 2 3 2 3 3 3 3 2 3 2 3 3 4 4 3 1 3 3 4 3 2
## [211] 3 3 2 3 4 3 3 3 1 2 3 4 1 3 3 3 3 2 3 3 1 4 1 3 3 1 1 1 1 3 2 2 3 3 1
## [246] 3 3 3 3 3 3 2 3 2 3 3 3 1 4 3 2 3 2 3 2 3 3 3 3 2 4 4 3 3 3 3 2 2 3 3
## [281] 4 3 3 4 2 3 3 3 2 2 2 3 3 3 3 3 1 2 2 2 3 3 3 2 3 3 2 3 3 2 2 2 2 2 4
## [316] 2 3 3 2 2 1 2 4 2 3 3 2 3 3 3 2 3 3 3 3 3 2 2 3 3 3 3 4 3 3 3 2 2 3 3
## [351] 3 3 2 3 2 3 2 2 2 2 3 3 2 2 2 2 3 2 3 3 3 3 3 3 1 3 3 3 2 3 3 3 2 3 2
## [386] 3 3 2 3 3 2 2 3 3 2 3 2 3 3 3 3 2 3 2 3 3 3 3 2 2 3 3 2 3 3 3 2 3 3 3
## [421] 4 3 3 3 3 2 3 3 2 3 4 3 3 3 3 3 2 2 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3
## [456] 3 3 2 2 2 2 3 3 2 2 1 3 3 3 3 3 3 3 3 2 4 2 2 3 2 3 3 2 2 3 3 3 3 2 3
## [491] 3 2 3 3 2 2 3 2 2 3 3 2 3 3 3 3 3 3 2 2 3 3 3 2 3 3 3 2 3 3 3 3 3 4 3
## [526] 3 3 3 2 3 3 3 2 3 3 3 3 3 3 3 3 2 2 2 2 3 3 3 3 2 3 3 3 2 3 3 3 3 2 2
## [561] 3 3 2 2 2 3 3 3 3 2 3 2 3 2 4 3 2 3 3 3 3 3 2 3 3 2 2 3 3 2 3 3 3 3 3
## [596] 3 3 4 3 3
## 
## Within cluster sum of squares by cluster:
## [1] 1180.009 6396.420 5146.643 3181.195
##  (between_SS / total_SS =  24.1 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
# Pass only the columns that were clustered.
fviz_cluster(k4, data = BS[, 12:46])

k4center <- as.data.frame(k4$centers)
colnames(k4center) <- c("No.Br","Br.Runs","Total.Vol","No.Trans","Value","Trans.Br.Runs","Vol.Tran","Avg.Price","PVNP","PVP6","PVOP","Br.57.144","Br.55","Br.272","Br.286","Br.24","Br.481","Br.352","Br.5","Br.999","Pt1","Pt2","Pt3","Pt4","Pt5","Pt6","Pt7","Pt8","Pt9","Pt10","Pt11","Pt12","Pt13","Pt14","Pt15")
cluster <- matrix(c("1","2","3","4"),nrow = 4)
k4center <- cbind(cluster,k4center)

Purchase summary over the period, k = 4

ggparcoord(k4center, columns = 2:9, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Purchase within promotion, k = 4

ggparcoord(k4center, columns = 10:12, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Brandwise purchase, k = 4

ggparcoord(k4center, columns = 13:21, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Price categorywise purchase, k = 4

ggparcoord(k4center, columns = 22:25, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Selling propositionwise purchase, k = 4

ggparcoord(k4center, columns = 26:36, groupColumn = 1, showPoints = TRUE, alphaLines = 0.3)

Comparing k2, k3 and k4, I think k3 is the best segmentation. The two clusters from k2 are too broad for market segmentation: in the cluster means above, one cluster is simply above average on nearly every variable and the other below, so two clusters cannot properly characterize the data. The clusters from k4 are too narrow for market segmentation, because some of them have similar characteristics.
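
The outputs printed above report between_SS / total_SS of 11.0 % for k = 2, 19.1 % for k = 3 and 24.1 % for k = 4. That ratio rises as clusters are added, so it is more useful next to the cluster sizes than on its own. A minimal sketch of collecting both in one table follows; `compare_k` is a hypothetical helper and the data are synthetic.

```r
# Hypothetical helper: summarise k-means fits for several k on the same data.
compare_k <- function(x, ks = 2:4, nstart = 25, seed = 120) {
  do.call(rbind, lapply(ks, function(k) {
    set.seed(seed)
    fit <- kmeans(x, centers = k, nstart = nstart)
    data.frame(
      k = k,
      between_over_total = fit$betweenss / fit$totss,
      smallest_cluster = min(fit$size),
      largest_cluster = max(fit$size)
    )
  }))
}

# Synthetic stand-in for the scaled data: three well-separated groups.
set.seed(2)
demo <- rbind(matrix(rnorm(150, mean = 0), ncol = 3),
              matrix(rnorm(150, mean = 4), ncol = 3),
              matrix(rnorm(150, mean = 8), ncol = 3))
res <- compare_k(scale(demo))
res
```

Applied to the real data, the call would presumably be `compare_k(BS[, 12:46])`. A very small `smallest_cluster` or a small gain in `between_over_total` from one k to the next are the kinds of signals that support the "too narrow" judgement.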

Features of cluster 1: Low customer loyalty, low sensitivity to discounts, purchases spread fairly evenly across brands, no particular preference for a price category or product type, few brands purchased, low volume of consecutive purchases, low total purchase volume, low average volume per transaction, and low total purchase value but a high average purchase price.

Features of cluster 2: High customer loyalty, few brands purchased, high purchase volume for specific brands, high purchase volume for certain product types and a certain price category, sensitivity that differs by type of discount, few consecutive purchases, and high average volume per transaction but low total value.

Features of cluster 3: Medium customer loyalty, high sensitivity to discounts (purchases increase under specific promotions), strong desire to purchase, medium purchase volume for each brand, many brands purchased, high total purchase volume and total value, no particular preference for a price category or product type, and a tendency toward consecutive purchases.
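
Descriptions like these can be cross-checked by listing, for each cluster, the variables whose standardized center lies furthest from zero (the overall mean). The sketch below is one way to do that; `top_features` is a hypothetical helper, and the small matrix holds a few values rounded from the k = 3 cluster means printed earlier.

```r
# Hypothetical helper: for each cluster, return the n variables whose
# standardized center is furthest from zero (the overall mean).
top_features <- function(centers, n = 3) {
  lapply(seq_len(nrow(centers)), function(i) {
    z <- centers[i, ]
    z[order(abs(z), decreasing = TRUE)][seq_len(n)]
  })
}

# Four columns of the k = 3 centers, rounded to three decimals.
centers <- rbind(
  c(Total.Volume = -0.466, Avg..Price = 0.184, Br..Cd..55 = -0.295, Others.999 = -0.298),
  c(0.717, -1.359, 2.709, -0.497),
  c(1.301, -0.038, -0.173, 1.257)
)
top <- top_features(centers, n = 2)
top
```

With the fitted model available, `top_features(k3$centers)` would rank all 35 variables. In this excerpt, cluster 2 stands out on `Br..Cd..55` and a low `Avg..Price`, which is consistent with its description as brand-loyal.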